Verbal Roots in the Sanskrit Wordnet

Authors

  • Malhar Kulkarni
  • Pushpak Bhattacharyya
Abstract

Wordnets (WNs) are accepted worldwide as useful lexical tools for Natural Language Processing (NLP). Projects for building WNs of different languages of the world have been under way for quite some time. The scenario for Indian Languages is also encouraging. The Indian Institute of Technology Bombay (IITB) has successfully created WNs for Hindi and Marathi. There have been more than 100,000 hits on the sites for these resources. The importance of developing a Sanskrit WN (SWN) in the context of Indian Languages (ILs) cannot be over-emphasised. Languages in India are broadly categorized into three families, one of which, namely Indo-European, historically has Sanskrit as a major language. Many modern Indian Languages like Hindi, Marathi, Bengali, Gujarati, Panjabi, Oriya etc. have a substantial number of borrowed Sanskrit words. Even the grammars of these languages have categories of words called tadbhava (generated from Sanskrit) and tatsama (similar to Sanskrit). SWN, it follows, can logically provide a natural platform for integrating IL WNs. Several institutes and scholars have been trying to undertake the task of building SWN with various strategies. Not much of substance, however, is visible on this front. The main issue regarding the structure of SWN that comes up in discussion is that, while building the SWN, traditional knowledge bases (śastric knowledge) should be used, and one should not blindly follow the structures of existing WNs, which are based on western concepts. The present paper aims to study this particular aspect.

Similar resources

Semi-Automatic Extension of Sanskrit Wordnet using Bilingual Dictionary

In this paper, we report our methods and results of using, for the first time, a semi-automatic approach to enhance an Indian language Wordnet. We apply our methods to enhancing an already existing Sanskrit Wordnet created from Hindi Wordnet (which is created from Princeton Wordnet) using the expansion approach. We base our experiment on an existing bilingual Sanskrit English Dictionary and show how ...

Coarse Semantic Classification of Rare Nouns Using Cross-Lingual Data and Recurrent Neural Networks

The paper presents a method for WordNet supersense tagging of Sanskrit, an ancient Indian language with a corpus grown over four millennia. The proposed method merges lexical information from Sanskrit texts with lexicographic definitions from Sanskrit-English dictionaries, and compares the performance of two machine learning methods for this task. Evaluation concentrates on Vedic, the oldest lay...

Introduction to Synskarta: An Online Interface for Synset Creation with Special Reference to Sanskrit

WordNet is a large lexical resource expressing distinct concepts in a language. The synset is a basic building block of the WordNet. In this paper, we introduce a web-based lexicographer's interface ‘Synskarta’, which is developed to create synsets from source language to target language with special reference to Sanskrit WordNet. We focus on introduction and implementation of Synskarta and how it c...

Introduction to Gujarati wordnet

Gujarati is one of the 22 official languages of India. It is an Indo-Aryan language descended from Sanskrit. Gujarati wordnet is being built using the expansion approach with Hindi as the source language. This paper describes experiences of building Gujarati wordnet. The paper discusses basic features of the Gujarati language and evaluates the suitability of Hindi for the expansion approach. Various issue...

Analysis of Sanskrit Text: Parsing and Semantic Relations

In this paper, we present our work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the ‘utsarga apavaada’ approach for relation analysis. A computational grammar based on the framework of Panini is being developed. A linguistic generalization for Verbal and Nominal database has been made and declensions ar...


Publication date: 2008